Using Machine Learning to Analyze Biological Macromolecular Crystallization Data
Abstract
The crystallization of a new macromolecule is still very much a trial-and-error process. To uncover useful trends in the crystallization of new macromolecules, Samudzi, Fivash and Rosenberg [12] performed a cluster analysis on the Biological Macromolecule Crystallization Database (BMCD) [7]. The crystallization parameters they used to differentiate among the experiments were a subset of the BMCD parameters: pH, temperature, molecular weight, macromolecular concentration, precipitant type, and crystallization method. Samudzi et al. performed a purely statistical analysis of the data and identified the clusters by visually inspecting the results. We have attempted to recreate their clusters using two different methods: SAS clustering (the method Samudzi et al. used) and COBWEB (a machine learning and discovery program). We then applied RL, an inductive learning program, to the clusters discovered by each method, and both verified and expanded on the Samudzi results. In addition to using the clusters as input to RL, we also applied RL to the entire BMCD data set in an attempt to learn interesting correlations among the various crystallization parameters. From the point of view of crystallography, we have discovered possibly significant new empirical relationships. From a machine learning perspective, our work has led to the refinement of existing methods for incorporating detailed domain knowledge into inductive analysis techniques. In this paper we report these initial experiments and the findings from applying RL to the BMCD as well as to the Samudzi and COBWEB clusters.

This research is supported in part by funds from the W.M. Keck Center for Advanced Training in Computational Biology at the University of Pittsburgh, Carnegie Mellon University, and the Pittsburgh Supercomputing Center.
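COBWEB, one of the two clustering methods mentioned above, builds its cluster hierarchy by choosing, at each step, the partition of records that maximizes Fisher's category utility. The sketch below computes category utility for a flat partition of crystallization records. It is an illustration, not the authors' code: it assumes every parameter has already been discretized into nominal bins (for example, pH binned as "acidic"/"basic"), and the attribute names, bin labels, and four sample records are hypothetical. Real BMCD fields such as temperature or molecular weight are continuous and would need a binning scheme or a continuous-attribute variant of the measure.

```python
from collections import Counter
from typing import Dict, List, Sequence

# A record maps a (hypothetical) attribute name to a discretized nominal value.
Record = Dict[str, str]


def sum_sq_probs(records: Sequence[Record], attrs: Sequence[str]) -> float:
    """Sum over attributes i and values j of P(A_i = V_ij)^2 within `records`."""
    n = len(records)
    total = 0.0
    for attr in attrs:
        counts = Counter(r[attr] for r in records)
        total += sum((c / n) ** 2 for c in counts.values())
    return total


def category_utility(partition: List[List[Record]], attrs: Sequence[str]) -> float:
    """Fisher's category utility of a partition of records into clusters.

    CU = (1/K) * sum_k P(C_k) * [sum_ij P(A_i=V_ij | C_k)^2 - sum_ij P(A_i=V_ij)^2]
    """
    all_records = [r for cluster in partition for r in cluster]
    n_total = len(all_records)
    baseline = sum_sq_probs(all_records, attrs)  # expected correct guesses without clusters
    cu = 0.0
    for cluster in partition:
        p_cluster = len(cluster) / n_total
        cu += p_cluster * (sum_sq_probs(cluster, attrs) - baseline)
    return cu / len(partition)


# Hypothetical, already-binned crystallization records (not BMCD data).
ATTRS = ["pH", "temperature", "precipitant"]
a = {"pH": "acidic", "temperature": "4C", "precipitant": "PEG"}
b = {"pH": "acidic", "temperature": "4C", "precipitant": "PEG"}
c = {"pH": "basic", "temperature": "20C", "precipitant": "ammonium sulfate"}
d = {"pH": "basic", "temperature": "20C", "precipitant": "ammonium sulfate"}

good = category_utility([[a, b], [c, d]], ATTRS)  # clusters group similar conditions
bad = category_utility([[a, c], [b, d]], ATTRS)   # clusters mix dissimilar conditions

if __name__ == "__main__":
    print(good, bad)  # 0.75 0.0
```

Grouping records with matching conditions yields a higher score (0.75) than mixing them (0.0), which is the signal COBWEB uses when deciding whether to place a new experiment into an existing cluster, create a new one, or merge or split nodes.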
Publication date: 2007